Term conflation methods in information retrieval: Non-linguistic and linguistic approaches
Abstract
Purpose – To propose a categorization of the different conflation procedures under two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach – Presents a range of term conflation methods that can be used in information retrieval. Uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FSTs), are emerging methods for the recognition and standardization of terms. Findings – The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value – Outlines the importance of FSTs for the normalization of term variants.
Similar resources
Applying Productive Derivational Morphology to Term Indexing of Spanish Texts
This paper deals with the application of natural language processing techniques to the field of information retrieval. To be precise, we propose the application of morphological families for single term conflation in order to reduce the linguistic variety of indexed documents written in Spanish. A system for automatic generation of morphological families by means of Productive Derivational Morp...
A model of an information retrieval system with unbalanced fuzzy linguistic information
Most information retrieval systems based on linguistic approaches use symmetrically and uniformly distributed linguistic term sets to express the weights of queries and the relevance degrees of documents. However, to improve the system–user interaction, it seems more adequate to express these linguistic weights and degrees by means of unbalanced linguistic scales, that is, linguistic term sets ...
A Model of Information Retrieval System with Unbalanced Fuzzy Linguistic Information
Most information retrieval systems based on linguistic approaches use symmetrically and uniformly distributed linguistic term sets to express the weights of queries and the relevance degrees of documents. However, to improve the system-user interaction it seems more adequate to express these linguistic weights and degrees by means of unbalanced linguistic scales, i.e., linguistic term sets with...
Acronyms as an Integral Part of Multi-Word Term Recognition - A Token of Appreciation
Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word term...
Arithmetic Aggregation Operators for Interval-valued Intuitionistic Linguistic Variables and Application to Multi-attribute Group Decision Making
The intuitionistic linguistic set (ILS) is an extension of the linguistic variable. To overcome the drawback of using a single real number to represent the membership degree and non-membership degree for ILS, the concept of the interval-valued intuitionistic linguistic set (IVILS) is introduced in this paper by representing the membership degree and non-membership degree of ILS with intervals. The oper...
Journal title:
- Journal of Documentation
Volume 61
Publication date: 2005